FOREWORD


This report was prepared by the Psychology Branch of the Aerospace
Medical Laboratory, Directorate of Laboratories, under Research and
Development Project 7184, Task 71581, with John P. Hornseth acting as
Task Scientist. The research reported herein was conducted by the
author at Antioch College, using the facilities of Contract No.
AF 33(616)-3404. The author is indebted to Darwin P. Hunt,
Charles A. Baker, and Melvin J. Warrick for very helpful suggestions
resulting from a critical review of early drafts of the report.


ABSTRACT 



The compatibility of typical psychological measurements with the
assumptions of common parametric statistical tests is examined.
Empirically obtained distributions of time scores and mathematically
derived error distributions are used to illustrate conditions which give

INTRODUCTION


This is the first in a series of studies designed to give some concrete,
quantitative indication of the degradation of statistical precision
accompanying violation of parametric assumptions. By showing the effect of
specific violations and the efficacy of certain "remedies" for them in
certain completely defined cases, it is hoped to provide some of the
perspective necessary for the proper selection and use of a statistical
test, whether it be parametric or distribution-free.

The belief appears to be widespread that, while mild violations of
parametric assumptions are common enough, extreme violations are quite rare,
so much so, in fact, that one need hardly concern himself with their
eventuality. There is little point in discussing the statistical effect
of violations until the extent of their occurrence is appreciated. The
first report in this series, therefore, will have the very limited
objective of presenting "data", i.e., distributions, containing serious
violations of assumptions and demonstrating how naturally and logically
such violations can occur. Measurements typical of research in the area
of experimental psychology will be used, namely time scores and errors.

The distributions to be presented are distributions of scores for a
single subject. Their relevance to multi-subject experiments will be
treated in the Discussion section.

VIOLATIONS OF ASSUMPTIONS BY TIME SCORES

In a recent experiment, reported in detail elsewhere (1), subjects
were required to reach through a constant distance and operate the middle
push button in a closely spaced array of three. Manipulated variables
were the diameter and spacing of the push buttons and the orientation of
the linear array; performance measures were reach-and-operation time and
"errors", i.e., frequency of inadvertent contact with the adjacent push
buttons (which the subject was instructed to avoid touching).
In order to check the influence of experimental conditions upon the
distribution of performance measures (and therefore to check the validity
of the parametric assumptions of normality and homogeneity), extensive
distributions of time scores were obtained, for a single subject, under
each of two of the conditions investigated in the experiment outlined
above. The subject was a full-time graduate assistant at an experimental
laboratory. He had had extensive experience in running subjects in
psychological experiments and had participated as a subject in many such
experiments, some of which were conducted by the writer. His general
level and pattern of performance were therefore known; the level was, in
fact, high and the pattern consistent. His motivation also was high;
the subject apparently regarded every experiment in which he participated
as a challenge. He was, in short, a "good" subject. Figure 1 shows a
very nearly normal distribution of time scores generated by this subject
in a different and later experiment. It is presented as suggestive
evidence that the nonnormalities of the distributions to be reported in
the present experiment are "attributable" to the experimental conditions
rather than to unique properties of the subject.

Figure 1. Nearly Normal Distribution of Reaction Times Generated
by Same Subject Producing Distributions I and II (to follow).


A single condition from the push-button experiment was selected and
the subject was given 2520 trials under that condition. These trials
were administered in six experimental sessions of 420 trials each, each
session being conducted on a different day and being interrupted by a
five-minute break after the 210th trial. Subsequently, this entire
procedure was repeated for a different experimental condition, using the
same subject but a different experimenter. Sequential effects were
checked by Cox and Stuart's S2 sign test for trend (2) and, though
small, were found to be highly statistically significant.
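The trend check mentioned above can be sketched as a sign test. Cox and Stuart describe several quick sign tests for trend; the variant below, which pairs each score in the first third of the sequence with the corresponding score in the last third and applies an exact binomial test to the signs of the differences, is an assumption made for illustration and should be checked against reference (2) before it is equated with the S2 statistic used here. The function name `sign_test_thirds` and the toy series are hypothetical.

```python
from math import comb


def sign_test_thirds(scores):
    """Sign test for trend comparing the first and last thirds of a sequence.

    Pairs scores[i] with scores[n - k + i] for i < k, where k = n // 3
    (the middle third is discarded). Ties are dropped. Returns a tuple
    (increases, untied_pairs, two_sided_p) using the exact Binomial(m, 1/2)
    distribution of the number of increases under "no trend".
    """
    n = len(scores)
    k = n // 3
    diffs = [scores[n - k + i] - scores[i] for i in range(k)]
    untied = [d for d in diffs if d != 0]
    m = len(untied)
    increases = sum(1 for d in untied if d > 0)
    if m == 0:
        return increases, m, 1.0
    # Two-sided exact p-value: double the smaller binomial tail, capped at 1.
    smaller = min(increases, m - increases)
    tail = sum(comb(m, j) for j in range(smaller + 1))
    return increases, m, min(1.0, 2 * tail / 2 ** m)


# Usage with a hypothetical series of 2520 scores that drifts slowly downward
# (as practice effects might produce), plus a repeating wobble.
toy = [50 - 0.01 * t + (t % 7) for t in range(2520)]
print(sign_test_thirds(toy))
```

Discarding the middle third is what makes this kind of test "quick": only a simple count of signs is needed, and the exact binomial tail avoids any normal approximation.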

The first distribution obtained, Distribution I, is shown in
Figure 2. The distribution was markedly bimodal, with the proportion of
time scores accompanied by errors being much greater for the second mode
than for the first. It was hypothesized, therefore, that: (a) the first
mode represented the short operation time made possible by a direct hit
upon the push-button "target" with the first thrust of the forefinger,
the relatively few accompanying errors being caused by the "avoided" push
buttons being contacted simultaneously by other portions of the subject's
hand (or by the forefinger subsequent to operation of the center button);
(b) the virtually scoreless trough between modes represented the time
consumed in thrusting the finger at the target, missing it and perhaps
striking the chassis or an avoided push button, then withdrawing and
repoising the finger for a second thrust; and (c) the second mode
represented the time consumed in (b) plus the additional time required,
after repoising the finger, to make a second, successful thrust. (A
miss on the first thrust did not necessarily result in an error, i.e.,
contact with an avoided button; and the subject could, and occasionally
did, miss more than once during a trial.)
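The mechanism hypothesized in (a) through (c) can be illustrated with a toy simulation. Every number in it (the miss probability, the thrust and repoise times in hundredths of a second, and the error probabilities after a hit or a miss) is a hypothetical assumption chosen only to show how a hit/miss process yields two modes with errors concentrated in the second; none is an estimate from the data of this report.

```python
import random


def simulate_trials(n_trials, p_miss=0.3, seed=1):
    """Toy hit/miss model of reach-and-operation time (hypothetical parameters).

    Each trial ends with one successful thrust (about 40 hundredths of a second);
    every preceding miss adds a miss-withdraw-repoise cycle (about 30 hundredths).
    Returns a list of (time, misses, error) tuples.
    """
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        misses = 0
        while rng.random() < p_miss:  # the subject may miss more than once
            misses += 1
        time = rng.gauss(40, 4) + sum(rng.gauss(30, 4) for _ in range(misses))
        # Assumed: touching an avoided button is much likelier after a miss.
        error = rng.random() < (0.40 if misses else 0.05)
        trials.append((round(time), misses, error))  # clock reads whole hundredths
    return trials


def error_rate(trials):
    """Proportion of trials in the list that were accompanied by an error."""
    return sum(e for _, _, e in trials) / len(trials) if trials else 0.0


trials = simulate_trials(2520)
hits = [t for t in trials if t[1] == 0]
missed = [t for t in trials if t[1] > 0]
print(len(hits), len(missed), error_rate(hits), error_rate(missed))
```

Because the repoise cycle adds a roughly constant block of time, first-thrust hits and one-miss trials form two separated clusters, which is the kind of scoreless trough the hypothesis attributes to Distribution I.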
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FREQUENCY  OF  OCCURRENCE  0*  Time 
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Figure  2.  Distribution  I 


In order to check this hypothesis, a second distribution of 2520
scores, Distribution II, was obtained for the same subject in precisely
the same way, except that the diameter of the push buttons was increased
from 1/2 inch to 1 inch and a different experimenter served as recorder.
It was expected that increasing the size of the target would reduce the
proportion of misses and therefore reduce the proportionate size of the
second mode. In order to obtain exact rather than approximate
information as to which trials required more than one thrust, the subject
reported this information and the experimenter recorded it for every
trial. The second mode was vestigial, appearing as a very slightly humped,
long positive tail, practically all of whose scores corresponded to trials
in which the target was missed on the first thrust (see Figure 3). The
hypothesis enunciated in the preceding paragraph was therefore considered
substantiated.


Figure 3. Distribution II: One Subject's Distribution (2520 Scores)

(Note change of scale: Maximum frequency is over twice that for
Distribution I, and time-score range is smaller.)
Because of the large number of scores upon which they are based,
Distributions I and II may be regarded as reproducing the essential forms
of the time-score populations associated with the respective experimental
conditions under which they were obtained. Not only are they decidedly
nonnormal (see Figures 4 and 5), but their shapes, especially the second
mode, have proved to be a function of an experimental variable, namely
button diameter, thus virtually assuring heterogeneity of variance. (The
variances, in fact, are 338.33 and 146.91, respectively, and therefore are
clearly heterogeneous.) The assumptions common to most parametric
statistical tests are therefore violated by the "populations" of time
scores associated with the experiment. Distributions I and II thus serve
to illustrate the hazard of assuming normality or homogeneity of variance
without empirical check, especially where time scores are involved. It
is particularly important to note, in this context, that Distributions I
and II were in no way "contrived", but were obtained under the conditions
of a perfectly routine experiment.

Figure 4. Distribution I (Histogram) and Normal Distribution
with Same Mean, Variance and Area

Figure 5. Distribution II (Histogram) and Normal Distribution
with Same Mean, Variance and Area
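The heterogeneity noted above can be expressed as a simple ratio of the two reported variances (Hartley's Fmax statistic for two groups). The sketch below only does the arithmetic on the published variances; it deliberately stops short of attaching a p-value, since the usual F-distribution reference itself assumes the normality that these distributions violate.

```python
from math import sqrt

# Variances reported for Distributions I and II (units as in the report).
var_i, var_ii = 338.33, 146.91

f_max = max(var_i, var_ii) / min(var_i, var_ii)  # largest / smallest variance
sd_ratio = sqrt(f_max)                           # ratio of standard deviations

print(f"Fmax = {f_max:.2f}, SD ratio = {sd_ratio:.2f}")
```

A variance ratio above 2 means the larger-diameter condition is roughly one and a half times less variable in standard-deviation terms.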


VIOLATIONS OF ASSUMPTIONS BY ERROR SCORES

The number, r, of errors committed in N trials is an error score. If
trials are randomly selected and their outcomes are independent, and if
the probability of an error on a single trial is a constant, p, then error
scores are binomially distributed with parameters N and p. The mean of
such a distribution is Np and its variance is Np(1 - p). The degree to
which such distributions violate the normality assumption is simply the
degree to which the normal approximation to the binomial distribution is
a poor fit, which, in turn, is a function of the parameters N and p. This
is illustrated in Figure 6.
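The binomial model just described can be checked numerically: building the point probabilities P(r) = C(N, r) p^r (1 - p)^(N - r) and taking their mean and variance recovers Np and Np(1 - p). The function names are illustrative, and N = 10 with p = .05 is a hypothetical small-N, low-p case of the kind discussed below.

```python
from math import comb


def binomial_pmf(n, p):
    """Point probabilities P(r) for r = 0..n errors in n independent trials."""
    return [comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(n + 1)]


def pmf_mean_var(pmf):
    """Mean and variance of a distribution given as point probabilities on 0..n."""
    mean = sum(r * pr for r, pr in enumerate(pmf))
    var = sum((r - mean) ** 2 * pr for r, pr in enumerate(pmf))
    return mean, var


n, p = 10, 0.05  # hypothetical small N and low error probability
mean, var = pmf_mean_var(binomial_pmf(n, p))
print(mean, n * p)           # both close to 0.5
print(var, n * p * (1 - p))  # both close to 0.475
```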

As N decreases, the number of different values r can assume becomes
too small for the discrete binomial distribution to be well approximated
by the continuous normal distribution. The discrepancy is particularly
important at the tails of the distribution. In order to "fit" a normal
distribution with the same mean and variance at its tails, the binomial
would have to approach zero probability by a series of fine gradations of
diminishing discrete, i.e., point, probabilities. This it is unable to do
if the number of point probabilities, i.e., bars in the histogram, is
small.

Figure 6. Error-Score Distributions. Errors are binomially distributed;
the normal approximation is poor if there is an appreciable probability
of either zero errors or N errors in N trials.

As p departs increasingly from .5, the binomial distribution becomes
increasingly asymmetrical. Thus, since the normal distribution is
symmetrical, the assumption of normality is violated to an increasing degree.
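The growth of asymmetry as p departs from .5 can be quantified with the standard skewness of the binomial distribution, (1 - 2p) / sqrt(Np(1 - p)), which is zero at p = .5, positive below it and negative above it. A small sketch, with N = 10 chosen only for illustration:

```python
from math import sqrt


def binomial_skewness(n, p):
    """Skewness of a Binomial(n, p) error-score distribution."""
    return (1 - 2 * p) / sqrt(n * p * (1 - p))


for p in (0.5, 0.3, 0.1, 0.02, 0.9):
    print(f"p = {p:.2f}: skewness = {binomial_skewness(10, p):+.2f}")
```

For fixed N the magnitude grows as p moves toward 0 or 1, and it shrinks as N grows, which is why small N and extreme p together are the worst case.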

As N decreases and p approaches one of its extremes, zero or one,
increasingly substantial proportions of the binomial histogram tend to
become concentrated at that extreme. This forces more and more substantial
proportions of the fitted, i.e., "assumed", normal distribution to be
concentrated over error-score values which are, in fact, impossible, thus
resulting in increasingly serious violations of the normality assumption.
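The share of a fitted normal curve lying over impossible error scores can be computed directly. The sketch below fits a normal distribution with mean Np and variance Np(1 - p) and reports its area below 0 and above N, i.e., over negative scores or scores exceeding N; treating 0 and N as the boundaries (rather than, say, -1/2 and N + 1/2) is an assumption of the sketch. `statistics.NormalDist` is from the Python standard library; the (N, p) pairs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def impossible_mass(n, p):
    """Area of the fitted normal curve over negative scores and scores above n."""
    fitted = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    below = fitted.cdf(0)      # negative error scores are impossible
    above = 1 - fitted.cdf(n)  # so are scores greater than n
    return below, above


for n, p in ((100, 0.5), (10, 0.5), (10, 0.05), (4, 0.02)):
    below, above = impossible_mass(n, p)
    print(f"N = {n:3d}, p = {p:.2f}: below 0 -> {below:.3f}, above N -> {above:.3f}")
```

With N = 10 and p = .05, nearly a quarter of the fitted curve lies over negative error counts.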

It is clear, therefore, that error scores violate the parametric
assumption of normality and that the degree of violation is likely to
become appreciable if either the probability of an error on a single trial
or the number of trials upon which the error score is based is small. If
both are small, the error distribution is certain to be quite appreciably
nonnormal.

These conditions are, in fact, quite likely to obtain in practice.
In the previously referenced multi-subject push-button experiment (1), the
empirical probability of an error on a single trial, i.e., the obtained
proportion of errors, ranged from .49 to .02, depending on the experimental
condition. (For errors defined as inadvertent operation, rather than
touching, of adjacent push buttons, empirical probabilities ranged from
.06 to .00.) It is natural to define as errors events having a low
probability of occurrence, p. This, coupled with the widespread tendency
to program an experiment so as to obtain from each subject a small number
of trials, N, under each of a large number of treatments, tends to create
the conditions under which the normality assumption is appreciably violated.

DISCUSSION


Neither time scores nor errors satisfy the parametric assumption of
normality. A normal distribution has infinite range, is symmetrical, and
is continuously distributed. None of these properties is characteristic
of raw, absolute time scores or errors:

Range: The normal distribution extends from minus infinity to plus
infinity. Absolute time scores, however, cannot be negative and, in
fact, generally cannot drop below some positive value corresponding to a
physiological limit for the speed with which the task can be performed.
Error scores, i.e., the number of errors in N trials, cannot be less than
zero nor greater than N (assuming a maximum of one error per trial). If
an appreciable proportion of a normal curve "fitted" to a time-score or
error distribution covers values which are in fact impossible, then the
normality assumption has been appreciably violated.

Symmetry: While it is not impossible for time-score or error distributions
to be exactly symmetrical, it is unlikely. Time scores tend to be
positively skewed, presumably owing to the fact that there is no limit
upon the value which can be assumed by scores above the median, while
those below the median must be concentrated between the median and the
physiological limit. Error scores are positively skewed if p is less than
1/2 and negatively skewed if it exceeds 1/2, the degree of skewness (for
constant N) increasing with increasing values of |p - 1/2|.

Continuity: While elapsed time is continuously distributed, measured time
has a discrete distribution owing to the fact that infinite precision of
measurement is impossible. In the experiment reported herein, for example,
the time clock was calibrated in hundredths of a second, thus giving a
discrete distribution of measurements, time values between points, i.e.,
hundredths, being recorded as "belonging" to the nearest point.
(Interpolation would merely increase the fineness of the discrete
gradations, e.g., substituting interpolated thousandths for recorded
hundredths.) Usually the measuring instrument is capable of sufficient
gradation to render this criticism trivial; however, there are exceptions.
In the case of errors, discontinuity may be a serious contributor to the
degree of nonnormality. The number of errors in N trials can assume only
N + 1 different values. If N is small, the point probabilities for these
N + 1 values must change by gross steps rather than by the succession of
fine gradations which would be necessary to approximate the normal
curve well. This is particularly important at the tails of the "fitted" normal
distribution, where the gross step is likely to be from a relatively large
value to zero. This sudden descent cannot be matched by a corresponding
drop in the ordinate of the fitted normal curve, because the normal curve
approaches zero probability only asymptotically.


In many cases the central portion of a distribution is well
approximated by a fitted normal curve, the fit becoming increasingly poor
as the tail areas are approached. If the fit is "good" over, say, 90 to
95% of the area covered by the curve, the curves tend, very deceptively,
to give the general appearance of a good overall fit, thus tempting the
experimenter to make a false proclamation of "normality". The fit at
the tails, however, is of critical importance and has not received the
attention it deserves. Extreme, i.e., tail, values from the hypothesized
distribution are those which contribute the most toward giving a test
statistic the extreme values which would place it in its rejection region;
that is to say, the greater the number of sample observations whose values
correspond to those in a single tail of the hypothesized distribution,
the more likely is the test statistic to fall in its rejection region.

The fit at the tails is therefore of critical importance to the commission
of Type I and, indirectly, of Type II errors. If the fit at the tails is
"poor", the true probability of such errors may differ greatly from their
nominal probabilities read from tables which were constructed under the
assumption of normality. As has been shown, poorness of fit, i.e., extreme
nonnormality, at the tails can result from the presence of impossible
scores close to the bulk of the distribution, as well as from pronounced
asymmetry or limitations on the number of different values a score can assume.
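The effect of tail misfit on Type I error can be made concrete with exact binomial probabilities. The sketch below assumes a hypothetical one-sided test at a nominal 5% level that refers the standardized error score z = (r - Np) / sqrt(Np(1 - p)) to the normal table, and computes the probability that r actually falls in each rejection region when the null value of p is true. With N = 10 and p = .05 (again hypothetical), the upper-tail test rejects whenever r is 2 or more, which happens with probability about .086 rather than .05, while the lower-tail test can never reject at all.

```python
from math import comb, sqrt
from statistics import NormalDist


def actual_type_i(n, p, alpha=0.05):
    """Exact rejection probabilities of nominal one-sided normal-theory tests.

    Returns (upper, lower): the true probability, under Binomial(n, p), that
    z = (r - n*p) / sd exceeds the upper normal critical value, or falls
    below the lower one.
    """
    mean, sd = n * p, sqrt(n * p * (1 - p))
    z_crit = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = .05
    pmf = [comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(n + 1)]
    upper = sum(pr for r, pr in enumerate(pmf) if (r - mean) / sd > z_crit)
    lower = sum(pr for r, pr in enumerate(pmf) if (r - mean) / sd < -z_crit)
    return upper, lower


for n, p in ((100, 0.5), (10, 0.05)):
    upper, lower = actual_type_i(n, p)
    print(f"N = {n}, p = {p}: upper {upper:.3f}, lower {lower:.3f} (nominal .050 each)")
```

The two regions err in opposite directions, so a two-sided test built from the same table can look roughly correct in total while being badly wrong in each tail.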
The preceding data and discussion have concerned distributions of scores
for a single subject. They are relevant to multi-subject populations (defined
here as distributions composed of an infinite number of individual, i.e., not
mean, scores, each of which was obtained from a different subject) to the
degree that the individual subjects' distributions resemble each other in form
and central tendency. Naturally, a certain variability among individual
subjects' distributions is to be expected in both these respects, and this
variability may tend to make the multi-subject distribution more nearly normal
than the distributions for the subjects as individuals. For example, suppose
each subject's distribution were identical in form to Distribution I but
different in location, i.e., mean. If the means all fell within a range of a
few hundredths of a second, the multi-subject distribution, although perhaps
more nearly bell-shaped than the individual subjects' distributions, would
still be bimodal, with a fairly sharply defined trough between modes. However,
if the subjects' true means were evenly distributed over a range of 25
hundredths of a second, the troughs for some subjects would correspond to
modes for other subjects, with the result that the multi-subject distribution
would tend to be unimodal and somewhat less skewed. Whether or not a better
approximation to normality is obtained in proceeding from a "typical"
one-subject distribution to a multi-subject distribution would appear to depend
roughly upon the relative variance and shape of the distribution of individual
subjects' true means with respect to the typical one-subject distribution of
individual scores. If the shape of the former is more nearly normal or if its
variance is much smaller than that of the latter, then, with infrequent
exceptions, one would expect the multi-subject distribution to be more nearly
normal than the typical one-subject distribution.

While these considerations should not be discounted, neither should they
be weighted too heavily. All of the previous comments relative to range and
continuity apply with equal force to multi-subject distributions. (Although
the range of time scores is undoubtedly greater in the multi-subject
distribution, the comments continue to apply if "physiological limit" is now
understood to refer to the fastest possible time by any subject.) A greater
degree of symmetry might frequently be expected in the central portion of a
multi-subject distribution. However, the presence of impossible scores
within a few standard deviations of the median and on one side of it, or
unequally distant from it, will still tend to ensure asymmetry at the tails.
The data presented in the body and appendix of this report illustrate
violations of two other parametric assumptions, homogeneity of variance and
independent observations, i.e., uncorrelated scores. Heterogeneity of
variance exists among both the time-score and error-score distributions, and
scores of both time-score distributions are sequentially correlated.

The assumption of homogeneity of variance, in tests for equality of
means, is a particularly frustrating one. Variances may be unequal when
the null hypothesis of equal means is true, but they are particularly
likely to be so when the null hypothesis is false, because in
many cases means and variances are positively correlated. Heterogeneity of
variance, therefore, tends to suggest that the null hypothesis is false. The
experimenter, however, does not "know" whether or not means are unequal until
he performs the statistical test, and this he cannot do so long as variances
are heterogeneous. Techniques, of course, are available for coping with this
situation; however, they are cumbersome at best. The likelihood of correlation
between means and variances is intuitively obvious in the case of time scores:
the longer the time required to perform a given type of task, the greater its
variability would be expected to be. This, in fact, was the case for the
time-score distributions, I and II, presented earlier. In the case of binomially
distributed errors, correlation between mean, Np, and variance, Np(1 - p), is
inevitable if either N or p is held constant, and is quite likely in all cases.
(See Figure 6.)
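The dependence of the binomial variance on the mean can be written out directly. With p fixed, Np(1 - p) = (1 - p) times the mean, so the variance is proportional to the mean; with N fixed, the variance equals mean times (1 - mean/N), which rises with the mean as long as the mean is below N/2. A small sketch, with N = 20 and p = .1 as hypothetical values:

```python
def binomial_variance(n, p):
    """Variance Np(1 - p) of a Binomial(n, p) error score."""
    return n * p * (1 - p)


# p fixed, N varied: variance is (1 - p) times the mean.
p = 0.1
for n in (5, 10, 20, 40):
    print(f"p fixed: mean {n * p:5.1f}, variance {binomial_variance(n, p):5.2f}")

# N fixed, p varied: variance = mean * (1 - mean / N), increasing while mean < N / 2.
n = 20
for p in (0.02, 0.1, 0.25, 0.5):
    print(f"N fixed: mean {n * p:5.1f}, variance {binomial_variance(n, p):5.2f}")
```

Treatments that raise the error rate (while it stays below one half) therefore also raise the variance, which is the coupling the text describes.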

An assumption made by nearly all statistical tests, whether parametric
or distribution-free, is that observations are independent, i.e., that the
outcome, or score obtained, from one trial is not influenced by that of any
preceding trial. When more than one score is obtained from a single subject,
the assumption of independence generally implies that there must be no
sequential effects, i.e., no learning, no fatigue and no motivational
fluctuations such as would be caused by boredom. Experiences of the writer
and his colleagues suggest that this assumption is never fully met when
repetitive measurements, subject to such sequential effects, are taken upon
a single subject. After thousands of "practice" trials the subject is still
learning, and by the time he has received that much practice, boredom and
fluctuations of attention are likely to have appreciable influence upon his
performance. Lack of independence due to such sequential effects can, of
course, be avoided by using only one score from each subject, different and
nonoverlapping groups of subjects being used under the various experimental
conditions.


CONCLUSION 

The parametric assumptions of normal distributions, equal variances
and uncorrelated scores are particularly susceptible to violation by
measurements typical of research in experimental psychology. The extent
of the violation may be small, or, in the case of the last two assumptions,
there may be no violation. However, drastic violations may occur quite
naturally as a logical consequence of entirely realistic experimental
conditions, and such extreme violations are not at all uncommon.
APPENDIX A

STATISTICAL TRENDS IN THE GENERATION OF THE ORIGINAL POPULATIONS
For both distributions, sequential effects were tested by means of Cox and
Stuart's S2 sign test for trend, applied to the entire distribution. For
Distribution I, Pr < 10; for Distribution II, Pr < 10.
APPENDIX C